The search functionality is under construction.

Author Search Result

[Author] Aki KOBAYASHI(60hit)

41-60hit(60hit)

  • Data-Parallel Volume Rendering with Adaptive Volume Subdivision

    Kentaro SANO  Hiroyuki KITAJIMA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Computer Graphics

      Vol:
    E83-D No:1
      Page(s):
    80-89

    A data-parallel processing approach is promising for real-time volume rendering because of the massive parallelism in volume rendering. In data-parallel volume rendering, local results processing elements(PEs) generate from allocated subvolumes are integrated to form a final image. Generally, the integration causes an overhead unavoidable in data-parallel volume rendering due to communications among PEs. This paper proposes a data-parallel shear-warp volume rendering algorithm combined with an adaptive volume subdivision method to reduce the communication overhead and improve processing efficiency. We implement the parallel algorithm on a message-passing multiprocessor system for performance evaluation. The experimental results show that the adaptive volume subdivision method can reduce the overhead and achieve higher efficiency compared with a conventional slab subdivision method.

  • Slot-Array Receiving Antennas Fed by Coplanar Waveguide for 700 GHz Submillimeter-Wave Radiation

    Hiroaki KOBAYASHI  Yasuhiko ABE  Yoshizumi YASUOKA  

     
    PAPER-Phased Arrays and Antennas

      Vol:
    E82-C No:7
      Page(s):
    1248-1252

    Thin-film slot-array receiving antennas fed by coplanar waveguide (CPW) were fabricated on fused quartz substrates, and the antenna properties were investigated at 700 GHz. It was confirmed that the transmission efficiency of CPW was 0.83/λm, and the rate of radiated power from a slot antenna was 0.5 at 700 GHz. The fabricated antennas worked as expected from the theory based on the transmission line model, and the two-dimensional 83 slot-array antenna fed by CPW increased the power gain by 11 dB over a single-slot antenna. The power gain of the antenna was 13 dBi and the aperture efficiency was 40% when the 700 GHz-submillimeter wave was irradiated through the substrate.

  • Acceleration Techniques for the Network Inversion Algorithm

    Hiroyuki TAKIZAWA  Taira NAKAJIMA  Masaaki NISHI  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    LETTER-Bio-Cybernetics and Neurocomputing

      Vol:
    E82-D No:2
      Page(s):
    508-511

    We apply two acceleration techniques for the backpropagation algorithm to an iterative gradient descent algorithm called the network inversion algorithm. Experimental results show that these techniques are also quite effective to decrease the number of iterations required for the detection of input vectors on the classification boundary of a multilayer perceptron.

  • A New Linear Prediction Filter Based Adaptive Algorithm For IIR ADF Using Allpass and Minimum Phase System

    James OKELLO  Yoshio ITOH  Yutaka FUKUI  Masaki KOBAYASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E81-A No:1
      Page(s):
    123-130

    An adaptive infinite impulse response (IIR) filter implemented using an allpass and a minimum phase system has an advantage of its poles converging to the poles of the unknown system when the input is a white signal. However, when the input signal is colored, convergence speed deteriorates considerably, even to the point of lack of convergence for certain colored signals. Furthermore with a colored input signal, there is no guarantee that the poles of the adaptive digital filter (ADF) will converge to the poles of the unknown system. In this paper we propose a method which uses a linear predictor filter to whiten the input signal so as to improve the convergence characteristic. Computer simulation results confirm the increase in convergence speed and the convergence of the poles of the ADF to the poles of the unknown system even when the input is a colored signal.

  • A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Vol:
    E96-D No:9
      Page(s):
    2047-2054

    Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads using integrated multiple cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement by multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. Such an eviction, called inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage that occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete to use the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take into account both ITKOs and capacity shortage. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with an estimation method used in the cache partitioning mechanism. Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable to avoid both ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce the inter-thread cache conflicts and hence improve performance.

  • A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering

    Ken-ichi SUZUKI  Yoshiyuki KAERIYAMA  Kazuhiko KOMATSU  Ryusuke EGAWA  Nobuyuki OHBA  Hiroaki KOBAYASHI  

     
    PAPER-Computer Graphics

      Vol:
    E93-D No:4
      Page(s):
    891-902

    Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with interactive dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives 13 to 392 times speed up in comparison with a ray tracing algorithm without organized rays and spheres. We found out that RTRPS also provides competitive performance even if only primary rays are used.

  • A Multiple Block-matching Step (MBS) Algorithm for H.26x/MPEG4 Motion Estimation and a Low-Power CMOS Absolute Differential Accumulator Circuit

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  Tomomi EI  

     
    PAPER-Digital

      Vol:
    E90-C No:4
      Page(s):
    718-726

    To drastically reduce the power dissipation (P) of an absolute difference accumulation (ADA) circuit for H.26x/MPEG4 motion estimation, a fast block-matching (BM) algorithm called the Multiple Block-matching Step (MBS) algorithm has been developed. The MBS algorithm can drastically improve the block matching speed, while achieving the same visual quality as that of a full search (FS) BM algorithm. Power dissipation (P) of a 0.18-µm CMOS absolute difference accumulator (ADA) circuit employing the MBS algorithm is significantly reduced to the range of about 0.3% to 12% that of the same ADA circuit adopting FS.

  • Dynamic Activating and Deactivating Loss Recovery Router for Live Streaming Multicast

    Yuthapong SOMCHIT  Aki KOBAYASHI  Katsunori YAMAOKA  Yoshinori SAKAI  

     
    PAPER-Network

      Vol:
    E89-B No:5
      Page(s):
    1534-1544

    Live streaming is delay sensitive and can tolerate some amount of loss. The QoS Multicast for Live Streaming (QMLS) Protocol, focuses on the characteristics of live streaming. It has been shown to improve the performance of live streaming multicast by reducing the end-to-end packet loss probability. However, the placement of active routers performing the QMLS function has not been discussed. This paper proposes a dynamic method to activate and deactivate routers in order to minimize the number of active routers for each QMLS-packet flow and discusses its parameters. The results of an evaluation show that the proposed method can reduce the number of active routers for each flow and adjust the active routers according to changes in the multicast tree.

  • Low Dynamic Power and Low Leakage Power Techniques for CMOS Motion Estimation Circuits

    Nobuaki KOBAYASHI  Tomomi EI  Tadayoshi ENOMOTO  

     
    PAPER-Low Power Techniques

      Vol:
    E89-C No:3
      Page(s):
    271-279

    To drastically reduce the dynamic power (PAT) and the leakage power (PST) of the CMOS MPEG4/H.264 motion estimation (ME) circuits, several power reduction techniques were developed. They were circuit architectures, which were able to reduce the supply voltages (VDD) and numbers of logic gates of not only the whole circuit but the critical path, a fast motion estimation algorithm, and a leakage current reduction circuit. A 0.18-µm CMOS ME circuit has been fabricated by adopting those techniques. At a clock frequency of 160 MHz and VDD of 1.25 V, PAT decreased to 75.9 µW, which was 5.35% that of a conventional ME circuit. PST also decreased to 0.82 nW, which was 3.93% that of the conventional ME circuit.

  • A Low Power Multimedia Processor Implementing Dynamic Voltage and Frequency Scaling Technique and Fast Motion Estimation Algorithm Called “Adaptively Assigned Breaking-Off Condition (A2BC)”

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    424-432

    A motion estimation (ME) multimedia processor was developed by employing dynamic voltage and frequency scaling (DVFS) technique to greatly reduce the power dissipation. To make full use of the advantages of DVFS technique, a fast motion estimation (ME) algorithm was also developed. It can adaptively predict the optimum supply voltage and the optimum clock frequency before ME process starts for each macro-block for encoding. Power dissipation of the 90-nm CMOS DVFS controlled multimedia processor, which contained an absolute difference accumulator as well as a small on-chip DC/DC level converter, a minimum value detector and DVFS controller, was reduced to 38.48 µW, which was only 3.261% that of a conventional multimedia processor.

  • Complex-Valued Bipartite Auto-Associative Memory

    Yozo SUZUKI  Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E97-A No:8
      Page(s):
    1680-1687

    Complex-valued Hopfield associative memory (CHAM) is one of the most promising neural network models to deal with multilevel information. CHAM has an inherent property of rotational invariance. Rotational invariance is a factor that reduces a network's robustness to noise, which is a critical problem. Here, we proposed complex-valued bipartite auto-associative memory (CBAAM) to solve this reduction in noise robustness. CBAAM consists of two layers, a visible complex-valued layer and an invisible real-valued layer. The invisible real-valued layer prevents rotational invariance and the resulting reduction in noise robustness. In addition, CBAAM has high parallelism, unlike CHAM. By computer simulations, we show that CBAAM is superior to CHAM in noise robustness. The noise robustness of CHAM decreased as the resolution factor increased. On the other hand, CBAAM provided high noise robustness independent of the resolution factor.

  • Uniqueness Theorem of Complex-Valued Neural Networks with Polar-Represented Activation Function

    Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E98-A No:9
      Page(s):
    1937-1943

    Several models of feed-forward complex-valued neural networks have been proposed, and those with split and polar-represented activation functions have been mainly studied. Neural networks with split activation functions are relatively easy to analyze, but complex-valued neural networks with polar-represented functions have many applications but are difficult to analyze. In previous research, Nitta proved the uniqueness theorem of complex-valued neural networks with split activation functions. Subsequently, he studied their critical points, which caused plateaus and local minima in their learning processes. Thus, the uniqueness theorem is closely related to the learning process. In the present work, we first define three types of reducibility for feed-forward complex-valued neural networks with polar-represented activation functions and prove that we can easily transform reducible complex-valued neural networks into irreducible ones. We then prove the uniqueness theorem of complex-valued neural networks with polar-represented activation functions.

  • Hybrid Quaternionic Hopfield Neural Network

    Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E98-A No:7
      Page(s):
    1512-1518

    In recent years, applications of complex-valued neural networks have become wide spread. Quaternions are an extension of complex numbers, and neural networks with quaternions have been proposed. Because quaternion algebra is non-commutative algebra, we can consider two orders of multiplication to calculate weighted input. However, both orders provide almost the same performance. We propose hybrid quaternionic Hopfield neural networks, which have both orders of multiplication. Using computer simulations, we show that these networks outperformed conventional quaternionic Hopfield neural networks in noise tolerance. We discuss why hybrid quaternionic Hopfield neural networks improve noise tolerance from the standpoint of rotational invariance.

  • FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER

      Vol:
    E98-C No:7
      Page(s):
    550-558

    As energy consumption of cache memories increases, an energy-efficient cache management mechanism is required. While a dynamic cache resizing mechanism is one promising approach to the energy reduction of microprocessors, one problem is that its effect is limited by the existence of dead-on-fill blocks, which are not used until their evictions from the cache memory. To solve this problem, this paper proposes a cache management policy named FLEXII, which can reduce the number of dead-on-fill blocks and help dynamic cache resizing mechanisms further reduce the energy consumption of the cache memories.

  • An Adaptive Algorithm for Cascaded Notch Filter with Reduced Bias

    James OKELLO  Shin'ichi ARITA  Yoshio ITOH  Yutaka FUKUI  Masaki KOBAYASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E84-A No:2
      Page(s):
    589-596

    In this paper we propose a new simplified algorithm for cascaded second order adaptive notch filters implemented using an allpass filter, for elimination of multiple sinusoids. Each of the stages of the notch filter is implemented using direct form second order allpass filter. We also present an analysis which compares the proposed algorithm with the conventional simplified algorithm, and which indicates that the proposed algorithm has a reduced bias in the estimation of the multiple input sinusoids. Simulation results that have been provided confirm this analysis.

  • A Metadata Prefetching Mechanism for Hybrid Memory Architectures Open Access

    Shunsuke TSUKADA  Hikaru TAKAYASHIKI  Masayuki SATO  Kazuhiko KOMATSU  Hiroaki KOBAYASHI  

     
    PAPER

      Pubricized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    232-243

    A hybrid memory architecture (HMA) that consists of some distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, the HMA needs the metadata for data management since the data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for the metadata reference. However, as the amount of the metadata increases in proportion to the size of the HMA, the memory controller needs to handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata and increases the number of metadata references. This results in an increase in the access latency to reach the target data and degrades the performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism loads the metadata needed in the near future by prefetching. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on an address difference that is the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44% and 9% on average.

  • A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning

    Shoichi HIRASAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Software

      Pubricized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2178-2186

    Automatic performance tuning of a practical application could be time-consuming and sometimes infeasible, because it often needs to evaluate the performances of a large number of code variants to find the best one. In this paper, hence, a light-weight rollback mechanism is proposed to evaluate each of code variants at a low cost. In the proposed mechanism, once one code variant of a target code block is executed, the execution state is rolled back to the previous state of not yet executing the block so as to repeatedly execute only the block to find the best code variant. It also has a feature of terminating a code variant whose execution time is longer than the shortest execution time so far. As a result, it can prevent executing the whole application many times and thus reduces the timing overhead of an auto-tuning process required for finding the best code variant.

  • Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems

    Muhammad ALFIAN AMRIZAL  Atsuya UNO  Yukinori SATO  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-High performance computing

      Pubricized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2749-2760

    Coordinated checkpointing is a widely-used checkpoint/restart protocol for fault-tolerance in large-scale HPC systems. However, this protocol will involve massive amounts of I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpointings to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to a 11% energy reduction at the cost of a relatively-small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.

  • Simplification of Liquid Dielectric Property Evaluation Based on Comparison with Reference Materials and Electromagnetic Analysis Using the Cut-Off Waveguide Reflection Method

    Kouji SHIBATA  Masaki KOBAYASHI  

     
    PAPER

      Vol:
    E100-C No:10
      Page(s):
    908-917

    In this study, expressions were compared with reference material using the coaxial feed-type open-ended cut-off circular waveguide reflection method to support simple and instantaneous evaluation of dielectric constants in small amounts of scarce liquids over a broad frequency range. S11 values were determined via electromagnetic analysis for individual jig structure conditions and dielectric property values without actual S11 measurement under the condition that the tip of the measurement jig with open and short-ended conditions and with the test material inserted. Next, information on the relationships linking jig structure, dielectric properties and S11 properties was stored on a database to simplify the procedure and improve accuracy in reference material evaluation. The accuracy of the estimation formula was first theoretically verified for cases in which values indicating the dielectric properties of the reference material and the actual material differed significantly to verify the effectiveness of the proposed method. The results indicated that dielectric property values for various liquids measured at 0.5 and 1.0GHz using the proposed method corresponded closely to those obtained using the method previously proposed by the authors. The effectiveness of the proposed method was evaluated by determining the dielectric properties of certain liquids at octave-range continuous frequencies between 0.5 and 1.0GHz based on interpolation from limited data of several frequencies. The results indicated that the approach enables quicker and easier measurement to establish the complex permittivity of liquids over a broad frequency range than the previous method.

  • Quantized Decoder Adaptively Predicting both Optimum Clock Frequency and Optimum Supply Voltage for a Dynamic Voltage and Frequency Scaling Controlled Multimedia Processor

    Nobuaki KOBAYASHI  Tadayoshi ENOMOTO  

     
    PAPER-Electronic Circuits

      Vol:
    E101-C No:8
      Page(s):
    671-679

    To completely utilize the advantages of dynamic voltage and frequency scaling (DVFS) techniques, a quantized decoder (QNT-D) was developed. The QNT-D generates a quantized signal processing quantity (Q) using a predicted signal processing quantity (M). Q is used to produce the optimum frequency (opt.fc) and the optimum supply voltage (opt.VD) that are proportional to Q. To develop a DVFS controlled motion estimation (ME) processor, we used both the QNT-D and a fast ME algorithm called A2BC (Adaptively Assigned Breaking-off Condition) to predict M for each macro-block (MB). A DVFS controlled ME processor was fabricated using 90-nm CMOS technology. The total power dissipation (PT) of the processor was significantly reduced and varied from 38.65 to 99.5 µW, only 3.27 to 8.41 % of PT of a conventional ME processor, depending on the test video picture.

41-60hit(60hit)